Structural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory
Abstract
This thesis investigates a new approach to document analysis based on structural patterns in XML vocabularies. My work is founded on the belief that authors naturally converge on a reasonable use of markup languages and that extreme yet valid instances are rare and limited. Actual documents may therefore be used to derive classes of elements (patterns) that persist across documents and distill the conceptualization of the documents and their components; these patterns can in turn support automatic tools and services that rely on no background information (such as schemas) at all. The central part of my work introduces, from the ground up, a formal theory of eight structural patterns (with three sub-patterns) that are able to express the logical organization of any XML document, and verifies their identifiability in a number of different vocabularies. This model is characterized by and validated against three main dimensions: terseness (i.e. the ability to represent the structure of a document with a small number of objects and composition rules), coverage (i.e. the ability to capture any possible situation in any document) and expressiveness (i.e. the ability to make explicit the semantics of structures, relations and dependencies). An algorithm for the automatic recognition of structural patterns is then pre-